Genome wide study of cysteine rich receptor like proteins in Gossypium sp.

Cysteine-rich receptor-like-kinases (CRKs), a transmembrane subfamily of receptor-like kinase, play crucial roles in plant adaptation. As such cotton is the major source of fiber for the textile industry, but environmental stresses are limiting its growth and production. Here, we have performed a deep computational analysis of CRKs in five Gossypium species, including G. arboreum (60 genes), G. raimondii (74 genes), G. herbaceum (65 genes), G. hirsutum (118 genes), and G. barbadense (120 genes). All identified CRKs were classified into 11 major classes and 43 subclasses with the finding of several novel CRK-associated domains including ALMT, FUSC_2, Cript, FYVE, and Pkinase. Of these, DUF26_DUF26_Pkinase_Tyr was common and had elevated expression under different biotic and abiotic stresses. Moreover, the 35 land plants comparison identified several new CRKs domain-architectures. Likewise, several SNPs and InDels were observed in CLCuD resistant G. hirsutum. The miRNA target side prediction and their expression profiling in different tissues predicted miR172 as a major CRK regulating miR. The expression profiling of CRKs identified multiple clusters with co-expression under certain stress conditions. The expression analysis under CLCuD highlighted the role of GhCRK057, GhCRK059, GhCRK058, and GhCRK081 in resistant accession. Overall, these results provided primary data for future potential functional analysis as well as a reference study for other agronomically important crops.

Conserved domain architecture was carried out to find duplicated domains and additional associated domains with stress-antifungal motifs 37 protocol. The predicted domains were arranged at their specific site on amino acid sequence using the Perl program. Three different classification methods were implemented in this study; (1) types and location (N-terminal or C-terminal) of additional domains associated with the Stress-antifungal/DUF 206/ Gnk2 domain with irrespective of duplicated domains, (2) complete domains, and (3) literature classifications e.g. sdCRRSP, ddCRRSP, PDLP, sdCRK, tdCRK and qdCRK 15 . We also included 35 land plants including mosses, bryophyte, gymnosperm, and angiosperm for evolutionary study of CRKs.
Protein statistics, chromosomal mapping, intron-exon distribution, and motif analysis. All gene and proteins associated data were retrieved from Cottongen, including protein length, molecular weight (kDa), charges, grand average of hydropathy, isoelectric point (Ip), chromosome start, and end. The chromosomal mapping was carried with TBTools gene location, intron-exon distribution generated with gene display server 38 , structural and functional motifs were detected with the MEME motif, and PROSIT Motifs discovery server 39,40 . Evolution and diversity analysis of CRKs in Gossypium. An advanced comparative genomics tool, OrthoFinder 41 was exploited to understand evolution and diversity in CRK proteins among five species. An additional DIAMOND tool was used for fast sequence similarity searches 42 . The graph clustering was done with the MCL clustering algorithm 43 . The gene tree inference and a distance matrix of the orthogroups were constructed with DendroBLAST 44 . A distance-based phylogeny tree was constructed using FastME 2.0 45 . For multiple sequence alignment, MAFTT 7.0 was used 46 . The maximum likelihood phylogenetic tree of large alignment was constructed using FastTreeMP 47 . To construct the Circos plot of five genomes, a BlastP program was used to determine the linkage and the circular plot was constructed with Advance Circos plot packages in TBTools. www.nature.com/scientificreports/ Expression profiling of CRKs genes. The expression profiling data is divided into three categories, i.e., tissue-specific (leaf, stem, root, ovule, etc.) expression, abiotic stress-specific (cold, heat, salinity, drought) expression, and biotic stress-specific CLCuD (Cotton leaf curl virus disease). To determine the expression profiling, the publicly available RNA-seq (SRP044705, SRP042128, SRP017168, SRP001603, SRP009820, and SRP027533) at CottonFGD 48 , whitefly infestation on CLCuD susceptible accession of G. hirsutum (SAMN07519654, SAMN07519653, SAMN07519652, SAMN07519651, SAMN07519650, SAMN07519649) 49 and whitefly infestation on CLCuD resistant accession of G. hirsutum (SAMN07251316 and SAMN07251315) 50 were used. The transcript level was calculated in fragments per kilo base per million (FPKM) by RNA-seq data pipelines. The gene expression clustering was carried using TBTools with parameters; log2 base, column cluster, and row cluster.
Protein-protein interaction network and host-pathogen model docking. The CRKs proteinprotein interaction network was generated using a STRING server with G. raimondii proteome interaction background. Begomovirus, a genus of the Geminiviridae family, also known as plant virus, infects a wide range of dicotyledonous plants. In cotton plants, it causes CLCuD. So, we included all CLCuD viral proteins (AV1, AV2, AC3, AC2, AC1, AC4, and C5) for their possible interaction with cotton CRKs. The sequence-based interaction was predicted using the Host-Pathogen Interaction predictor (HOPITOR) 51 . The 3D structure of CRKs and viral proteins were predicted using the I-TASSER server 52 . The host-pathogen protein docking was carried with ZDOCK 53 . The protein complex was visualized with discovery stadio 54 and the active sites and interactive bonds were presented with Ligplot + 55 . Target site prediction and expression profiling of miRNA. To find the miRNA target site in CRK coding sequences of Gossypium sp. mature miRNA sequences were retrieved from the Plant non-coding RNA database 57 and PmiREN 58 . These downloaded miRNA and CDS sequences of Gossypium sp. were used as input data in psRNA target: a plant small RNA target analysis 59 .

SNPs and InDels determination in
The expression profiling of CRK-targeted miRNA was assessed using the miRNA-seq of Gossypium sp. data located at PmiREN (Plant Micro RNA Encyclopedia ) 58 covering different tissues including anther, fiber, embryogenic, hypocotyl, leaf, ovule, root, shoot apical, stem, and apexes.

Plant growth and CLCuD stress.
To validate the RNA-seq data of GhCRKs, we selected two G. hirsutum accessions, Mac7 (resistant to CLCuD) and Coker 312 (susceptible to CLCuD). A set of 20 plants was sown in the glasshouse for each accession. After five weeks of post-germination, one set of each accession (10 plants) was transplanted in the net house to expose the whitefly (CLCuD career vector). After two weeks of post-transplantation, a high population of whitefly was seen on Coker 312's as well as Mac7 plants. All Coker 312 were 100% infected with severe symptoms, while no symptom was found in Mac7 plants.
RNA extraction and real-time quantitative PCR analysis. Young leaves were collected from net house and glasshouse. Total RNA was extracted through the Trizol method 60 , and treated with RNase-free DNase (Promega, USA). The quality was assessed by gel electrophoresis. A 12μL sample with 100 ng/μL concentration, converted into cDNA using RevertAid Hminus First Strand cDNA Synthesis Kit (Thermo Scientific).
Based on the results of biotic stress expression profiling and host-pathogen protein interaction of GhCRKs, a set of genes was selected for qPCR analysis, and gene-specific primers were designed. Real-time PCR was performed using a Bio-Rad iCycler Thermal Cycler iQ5 and DNA Master SYBR Green I kit (Roche, Basel, Switzerland). Reactions were carried out in triplicate and each replicate consisted of 2 μL of cDNA (with concentration of100ng/μL), 0.5μL of each primer (concentration 10 μM/μL) and 5 μL SYBR Green Master Mix, making a final volume of 12 μL reaction. The PCR reactions were carried out using the following conditions: the initial temperature at 95 °C for 5 min, followed by 35 cycles of 95 °C for the 30 s, 58 °C for 30 s, and 72 °C for 1 min. Each biological sample was used in triplicates and the average expression value was calculated. The data ware normalized with the largest value of each panel making the highest relative expression as one.

Results
Gene organization of CRKs in diploid and tetraploid cotton species. The genome-wide analysis identified a total of 60, 74, 65, 120, and 118 CRK genes in Gar, Gra, Gher, Gbar, and Ghir, respectively (Tables S1-S5). The protein features were presented for each species, including protein length, molecular weight, charge, isoelectric point, and grand average of hydropathy. A summary of these features showed that the longest proteins sequence comprises 884 amino acids, while the shortest was composed of only 127 amino acid residues. Likewise, the molecular weight range was observed between 97.675 and 14.55 kDa, whereas the protein charge ranges between + 23.5 and − 15. Moreover, the isoelectric point ranged from 9.229 to 4.426, while the grand  Table S6). Overall, we found that most of the genes were localized on Chr6, Chr10, and Chr11 in all five species, representing their common locus the genomes.
Domain architecture, classification, and phylogenetic analysis of cotton CRKs. To provide a comprehensive study, we have introduced two new classification methods in cotton CRKs. The first was based on the type of domains presence and absence. In this classification system, all identified CRK genes were divided into 11 major classes. Through this classification, we have identified many Stress-antifungal (DUF 26)-associated functional domains such as ALMT, FUSC_2, Cript, DUF3403, FYVE, TauE, and Pkinase-tyr. Of these 11 classes, class VI (Stress-antifungal-Pkinase-tyr) has the largest number of genes, followed by class IV and VII. Class VI was the most commonly found class in all species with 73, 69, 36, 28, and 37 genes in Ghir, Gbar, Gar, Gher, and Gra, respectively. In contrast to shared classes, several species-specific classes (e.g. class_II only found in Gher, class_II only in Gra, class_IX only in Ghir, and class_X in Gbar) were also observed ( Fig. 2, Table 1, Table S7). The second classification method included the number of duplicated domains in addition to domain presence and absence. These classes were named as a subclass of CRK genes in Gossypium sp. All 437 genes were distributed into 43 sub-classes (I-XXXIII classes). The highest number of genes were observed in subclass IX (162 genes) with domain architecture Stress-antifungal__Stress-antifungal__DUF3403 and VII subclass (116 genes) with domains; Stress-antifungal__Stress-antifungal. These two classes were commonly found in all five species with the highest number of genes compared to other subclasses. In contrast to common domains architectures, unique and species-specific domain architectures were also observed (Fig. S2, Tables S8-S9).
The accumulative phylogenetic tree of all five species is divided into several major and minor clades. However, we did not observe any species-specific clade showing species diversity in the CRKs among Gossypium sp. The whole phylogenetic tree was divided into 51 subclades/clusters (clus), and of these, clus-XLVII had maximum genes with 15 CRKs, followed by clus-V, clus-V, clus-XXIII, and clus-VII with 14 CRKs. Similarly, clus-I, clus-XIV, and clus-XLIII consist of 13 CRKs and so on. Most of the clades possessed a range of 7 to 14 CRKs. The number of clades was related to subclasses of cotton CRKs (Fig. S3). Taken together, these CRK classifications identified several novel classes of species-specific and common members.

Evolutionary study of cotton CRKs with land plants.
For the evolutionary study of Gossypium sp.
with land plants, we included 2,026 CRKs from 35 plants, including mosses, bryophyte, gymnosperm, and angiosperm 15 (Table S10). Of these land plants, Chlamydomonas reinhardtii, Coccomyxa subellipsoidea, Micromonas pusilla, Ostreococcus lucimarinus, Volvox carteri did not possess any CRK encoding genes. The identified CRKs were screened through Pfam and revised the steps mentioned for cotton classification. The conserved domain-pattern-based classification identified a total of 19 different patterns, thus classified into 19 subclasses (Table 2). Of these nineteen classes, class IV (ddCRRSP) and class X (ddCRK) were more common in all higher plants, while lower plants e.g., Marchantia polymorpha (liverwort) had only class I (sdCRRSP) followed www.nature.com/scientificreports/ by Selaginella moellendorffii (lycophyte) that possessed class I, class VIII (sdCRK) and class X (ddCRK). The Gossypium sp. showed several genus-specific classes including sdCRRSPdS, ddCRRSPD, tdCRRSP, sdCRKD, ddCRKF, ddCRKS and qdCRKD. These classes are only found in cotton species showing their diversity with other dicot plants (Fig. 3, Table S11).
The comparative genomics summarized that all CRKs from 5 species of cotton plants were divided into 52 orthogroups covering 416 genes (95.2% of genes in orthogroups) with only 21 unassigned genes (4.8% of genes). Of these, 40 orthogroups were shared by all five species, while none of the orthogroups were species-specific. Overall, the mean and medians were recorded as 8 and 7 orthogroups, respectively. At the species level, however, we observed that G. arboreum shared more orthologs with G. barbadense (92 orthologs) and G. hirsutum (89 orthologs) as compared to G. raimondii (58 orthologs) and G. herbaceum (57 orthologs). Similarly, G. barbadense shared a higher number of orthologs with G. hirsutum (100 orthologs), followed by G. raimondii (66 orthologs), G. arboreum (56 orthologs), and G. herbaceum (54 orthologs) ( Fig. 4A-C).  www.nature.com/scientificreports/ The species-wise orthogroups distribution represented that most of the identified genes belonged to one of 52 orthogroups, i.e., G. arboreum (84.6% of total GaCRKs), G. raimondii (88.5% of total GrCRKs),) and G. herbaceum (86.5% of total GheCRKs), G. barbadense (84% of total GbCRKs), and G. hirsutum (96.2% of total GhCRKs). The species-wise relationship demonstrated remarkable relation among five species. We also drew one-to-one, one-to-many, many-to-one, and many-to-many species relationships. We concluded that a small number of genes contributed in one-to-one and many to one, while a higher number of genes showed one-tomany and many-to-many with a concluding close relationship of five species (Fig. 4D,E).
The species-wise phylogenetic tree with Theobroma cacao (T. cacao) as an outgroup, demonstrated that G. arboreum (A2-genome) has close relation with G. hirsutum, followed by Gra (D-genome). The gene duplication event was also predicted at all internal and terminal nodes. N 0 node represented the common ancestor of all cotton species with T. cacao, demonstrated 69 duplications with 100% confidence and 40 duplications with 50% confidence. Emerging from the N 0 node, T. cacao gained 16 duplications, Gbar gained 22, Gra gained 12 and Ghir gained 24 duplication events. While the Gher (A1-genome) and Gar (A2 genome) did not show any duplication events. (Fig. 4F,G). The chromosomal location collinearity suggested that the Ghir has more syntenic blocks within its subgenome (A and D genomes) followed by Gar (A-genome) while the G. barbadence showed more collinearity lines with Gher (A1-genome) and Gra (D-genome) (Fig. 4H,I).
The RNA-seq data analysis of CRK genes in G. arboreum showed distinctive expression patterns in diverse tissues (leaf, stem, and root) at different time intervals (10 DPA, 15 DPA, and 20 DPA). Some genes, including GaCRK02, GaCRK09, GaCRK52, and GaCRK29, showed increased transcript levels in the leaf than stem or root. Similarly, several genes (GaCRK02, GaCRK24, GaCRK03, and GaCRK38) had elevated transcript levels in the stem, while others exhibited higher mRNA levels in the root (GaCRK24, GaCRK02, GaCRK43, and GaCRK07). In addition, the expression profiling in the ovule at 10 DPA, 15 DPA, and 20 DPA demonstrated remarkable differences in the expression of CRK genes. For instance, in ovule development at 15 DPA, GaCRK03, GaCRK02, and GaCRK38 showed the highest expression levels, and this set of genes also showed similar results at 15 DPA and 20 DPA in ovule and fiber development (Fig. S5A, Table S13). In summary, most of the genes demonstrated tissue-specific expression, However, GaCRK02 showed high transcripts in all tissues at different time intervals. www.nature.com/scientificreports/ The G. raimondii RNA-seq analysis of CRKs in different tissues such as seed, fiber, ovule, and leaf at different time intervals (10 DPA, 20 DPA, 30 DPA, and 40 DPA) was also presented. Results demonstrated that GrCRK11, GrCRK67, and GrCRK52 showed higher transcript levels in seed germination at 10 DPA in comparison with 20 DPA, 30 DPA, and 40 DPA data. Similarly, some genes (GrCRK67, GrCRK21, GrCRK24, and GrCRK42) exhibited increased transcript levels at 20 DPA, while others showed higher mRNA levels at 30 DPA (GrCRK21, GrCRK67, GrCRK24, and GrCRK42) and 40 DPA (GrCRK63, GrCRK21, GrCRK11, and GrCRK67). Similar expression patterns were also observed in fiber development at 10 DPA (GrCRK24, GrCRK11, GrCRK46, and GrCRK18) and 20 DPA (GrCRK06, GrCRK21, GrCRK67, and GrCRK24) (Fig. S5B, Table S14). Like the G. arboreum CRKs expressions, the G. raimondii CRKs also demonstrated clusters of genes co-expressing in different tissues. However, GrCRK67 showed its putative role in all tissues.

Protein-protein interaction network and host-pathogen interaction. The protein-protein inter-
action network of GhCRKs provided endogenous protein interactions including, experimentally determined interactions, gene fusion, gene co-occurrence, co-expression, and protein homology. Of 118 GhCRKs, only a few proteins showed internal interactions. For instance, GhCRK067 has the highest number of interactions including GhCRK067-GhCRK025 and GhCRK67-GhCRK048 possessing experimentally validated interactions, GhCRK067-GhCRK028 complex has three types of correlation i.e. Co-expression, protein homology, and text mining. Similar interactions were also observed in the GhCRK084-GhCRK028 complex (Fig. S6). We selected ten genes for further host-pathogen interaction analysis based on the differential expression of CRKs in Mac7 and NIAB-Karishma under CLCuD stress. The protein-protein interaction probability analysis demonstrated strong interaction of most of the up-regulated genes in Mac7 to the Begomovirus protein, including GhCRK082 (strongly interaction probability with AC1, AC2, AC3, AC4, AV2, and C5) and GhCRK087 (strongly interaction probability with AC1, AC2, AC3, and AV2) had a probability value of more than 0.9. In comparison, almost all other genes had greater than 0.5 values, which is significant for protein-protein interaction (Fig. S7, Table S22). Furthermore, the sequence-based interactions of GhCRKs with viral proteins were also demonstrated with hostpathogen protein-protein interaction with ZDOCK molecular docking. The 3D host-pathogen protein docking analysis demonstrated the interaction network between host and pathogen amino acidic residues. The upregulated genes and their interaction with CLCuD viral proteins confirmed their direct interaction. The GhCRK059 www.nature.com/scientificreports/ and AC2 interaction provided the active residues and their bonding types. As such, the GhCRK059 protein (A chain) with residues Leu 23 , Arg 22 , Ser 24, and His 25 interacted with AC2 protein (Q chain) at Cys 36 , Asp 110, and Ser 39 through hydrogen bond (green lines) and salt bridge (red lines). Similar results were observed in GhCRK087-AC3, GhCRK087-AV2, GhCRK082-AC2, GhCRK082-AV2, and GhCRK082-AC3 complexes. However, the number of bonds and types of bonds varied from complex to complex e.g., the highest number of interactions was found in GhCRK082-AC3, followed by GhCRK082-AC2 (Fig. 6, Fig. S8).

SNPs and InDels variants in CRKs of resistant G. hirsutum. Mac7 is a tolerant G. hirsutum accession,
which is developed by USDA by breeding program. To find genetic variation and transcriptomics variation, we have identified SNPs and InDels associated with CRKs in Mac7 and Coker 312. The genome-wide-genetic variation in Mac7 identified a total of 192 and 208 genes having SNPs and InDels concerning TM-1 reference G. hirsutum genome, respectively. Similarly, a total of 62 and 192 genes with SNPs and InDels were found in Coker   (Table S23). In addition to genomic region-based variants, we also categorized variants into impact-based levels e.g. high, low, moderate, and modifier. In the case of variants by impacts, a total of 12 and 23 genes showed high impact SNPs and InDels in Mac7, respectively. While only 3 SNPs and 17 InDels associated genes were found in Coker 312 under the same variant impact level (Table S24). In summary, we have observed many differences in CRK gene sequences of the Mac7 and Coker 312 (Fig. S9). There were several unique SNPs and InDels in CRK genes of Mac7 accessions which could be the source of resistance to different biotic and abiotic stresses.

Expression profiling and RT-PCR analysis of GhCRKs under CLCuD in resistant and susceptible
G. hirsutum. We used RNA-seq experimental data from Mac7 (G. hirsutum accession, resistant to CLCuD) 50 and NIAB-Karishma (a mutant Coker 312 G. hirsutum, highly susceptible to CLCuD) 49 for expression analysis (Fig. S12). The RNA-seq data include viruliferous whitefly infestation for cotton leaf curl virus disease (CLCuD), Pakistan's threat to cotton production. The expression profiling and comparison demonstrated that 86 CRK genes and 32 genes are expressed in Mac7 and NIAB-Karishma, respectively. The comparative study revealed that most CRK genes showed increased expression in resistant (Mac7) than susceptible NIAB-Karishma. For instance, GhCRK026, GhCRK013, GhCRK007, GhCRK116, GhCRK108, GhCRK099, GhCRK082, GhCRK072, and GhCRK096 were differentially upregulated in resistance under CLCuD disease treatment (Fig. S13, Table S33). The quantitative real-time expression analysis of selected genes displayed intriguing findings. Of the nine selected genes (Table S34), only three genes (GhCRK093, GhCRK82, and GhCRK096) showed some level of www.nature.com/scientificreports/ expression in Coker 312 (susceptible accession) and their transcripts level decreased when treated with CLCuD. However, all 9 genes showed increased expression in Mac7 (resistant accession) in control as well as infested sample, and some genes (GhCRK059, GhCRK081, and GhCRK087) were upregulated under CLCuD (Fig. 8).
Generally, we concluded that the CRK genes in Mac7 and Coker are shown to be involved in CLCuD stress.

Discussion
Cotton (Gossypium sp.) is a worldwide economical crop that produces raw fiber and seed oil for the textile and oil industries. But the environmental stresses including biotic (insect, pest, virus, and bacteria) and abiotic (drought, salinity, heat, and cold) are limiting its growth and yield. Thus, the improvement of resistant genetic makeup is essential for high-quality cotton production. CLCuD is one of the major biotic stresses in Asia such as Pakistan and India and this deadly virus decreases cotton yield several-fold every year. Therefore, identification, characterization, and functional analysis of stress-responsive genes are the top targets of cotton researchers. Several genome-wide association studies have been conducted to find important genes involved in different agronomical traits such as fiber yield and improvement, gossypol content, drought, and salt stress-resistant. Similarly, many important resistant QTLs and markers against different abiotic and biotic stresses have been discovered 63 . www.nature.com/scientificreports/ However, very little is known about the CLCuD resistant mechanism. The current study also provided important data for further functional analysis against CLCuD. The biotic stress signal in plants observed by pattern recognition receptors (PRRs) also plays a vital role in activating plants' immunity. PAMPs are the biotic stress signals that activate a combination of immune receptors complexes and plant immune response signaling pathways 64 . The pathways that respond to early immune signaling are kinase and transcriptional gene regulation 65 . Antifungal proteins are also called CRKs (Cysteine (C)-rich receptor-like kinases gene) and DUF26 or Gnk-2. CRKs are an important class of receptor-like kinase (RLKs) that play vital roles in disease resistance in plants. Despite the known role of CRKs in plant resistance 17,21,22,[66][67][68] , there is a big gap in a genome-wide comparative study of CRKs in Gossypium species, which was covered in this study. The genomic analysis effectively transfers knowledge from one taxon to another, allowing for a faster pace of gene discoveries associated with disease resistance.
The current study identified CRKs in five Gossypium sp. and classified them according to domains architectures. A total of 437 CRK genes have stress-antifungal domains in G. arboreum (60 genes), G. raimondii (74 genes), G. herbaceum (65 genes), G. hirsutum (118 genes), and G. barbadense (120 genes). A similar study of CRK was also conducted by Ting et al. and Hussain et al. 37,69 , but several gaps were filled by this study as we identified an increased number of CRK genes compared to the previous studies. In addition, several additional www.nature.com/scientificreports/ bioinformatics and expression analyses were also added in the current work. The number of CRKs in tetraploids was not more than two-fold of diploid species that may be due to gain and loss of the gene during polyploidization of two diploid genomes (A and D genome) to make the tetraploid genomes (AA t DD t ) 70 . Furthermore, the segmented and tandem duplications influence the development of multiple CRKs gene families, including the RLKs family 71 . The gene structure analysis identified exon, intron, and UTR regions of the genes. The length of intron and exon is related to the phylogenetic tree's construction, and it is particularly high in G. raimondii 72 .
The comparative analysis showed both differences and similarities in the exon number that might be related to their function and conservation 73 . Chromosomal mapping of CRKs showed their abundance on a few chromosomes like Chr6, Chr10, and Chr11 in all five species. Similar chromosomal mapping was also reported by Ting et al. 69 . The region behind the higher density of genes on some species might be due to segmental duplications that occurred between non-allelic chromosomes in cotton. Furthermore, the DUF26 containing genes normally show tandem duplication 74 .
In this study, we have classified CRKs in all five species in three different ways, conserved domains absence and presence (DAP), conserved domain repeats (DR), 3rd classification based on Aleksia et al. 15 . Based on the first and second classifications, we have identified several DUF26 associated decoy domains, like ALMT (aluminum-activated malate transporter domain) 75 , FusC_2 (Fusaric acid resistance protein-like domain) 76 , Cript (Microtubule-associated protein domain), FYVE (evolutionarily conserved double-zinc-binding domain) which are functionally well-characterized protein domains and are involved in diver biological function including efflux of organic acids 77 , salt stress tolerance and the regulation of malic acid content 78 , linkage of mRNA transporter to endosome trafficking 79 . Such resistance decoy domains also reported in Gossypium 37 may provide additional features to CRK genes for plants adaptation. Molecular phylogenetic analysis and OrthoFinder results suggested a significantly divergent evolutionary history of CRK genes in five species. The species-based phylogenetic tree of CRKs suggested G. arboreum as the ancestor of other Gossypium sp. and the othogroups also found some species-specific groups. However, the five species showed a close relationship in sequence similarity, possibly due to the origination of CRKs from common ancestors. The evolution of genes is mediated by sequence exchange, tandem or segmental duplication events, or gene conversion [80][81][82] .
A single-nucleotide polymorphism (SNPs) is the simplest form of genetic variation among individuals that can prompt minor changes in phenotypic, physiological, and biochemical characteristics. These mutations in the gene sequence alter the amino acid sequence, which may change the function of the gene. Several SNPs were identified and used as a genetic marker for the identification of qualitative trait loci (QTLs) associated with multiple agronomic features of cotton including fiber quality and quantity, resistance to biotic and abiotic stresses [83][84][85][86][87] . However, very little is known about SNPs associated with biotic stress resistance. So, we also have found several SNPs in CRK genes of resistant accession of G. hirsutum in comparison with the Coker 312 (highly susceptible) and TM-1 reference genome. We suggest that these SNPs may have a significant role in plant adaptation under CLCuD. However, further experimental validation is required to confirm their role as selection markers.
miRNA contains 17 to 24 nucleotides (nt) and is an important gene regulatory factor in plants 88 . miRNAs take part in diverse biological functions of plants at different transcriptional and translational levels [88][89][90][91] . They also play essential roles in developing immunity against pathogens succeeding the endogenous defense-related genes and down-regulating the pathogens of the exogenous viral plants [92][93][94] . The miRNA target sites in CRKs provide essential primary data for understanding CRKs regulation under various stresses. In cotton CRKs, we found different miRNA families in diverse cotton species such as miR172, miR1373, miR169, and miR164 in GaCRKs, whereas miR172, miR159, miR169, and miR397 in GhCRKs. The genotype-depended response of miRNA to biotic and abiotic stresses varies from cultivar to cultivar 95 . Another study of miRNA microarray in cotton found high expression of miR156, miR169, miR535, and miR827 under salinity treatment 96 . The miR172b-SSR was used as a biomarker for identifying the different responses of rice cultivars under salt stress 97 . In agreement with the literature, several miRNA families were identified which could target CLCuDV genes with perfect and near-perfect complementarity 98 . In addition, the miR172 initiates floral growth and modifies reproductive growth and the co-regulation of miR156 and miR172 in the origination and improvement of cotton plants 99 . So, the identification of miRNAs in cotton CRKs would be helpful for understanding the post-transcriptional regulation of CRKs under diverse stress conditions. We also summarize based on the miRNA target site prediction and expression profiling of miRNA, there might a strong correlation between miRNA expression and functional regulation of cotton CRK genes. As we have observed that CRKs have the highest target side for the miR172 family and in the meanwhile the miR172 family has high expression in most of the tissue, demonstrating its role in the regulation of various biological mechanisms.
The difference in the number of genes and classes among 35 plants species is due to the expansion of CRK families through small-scale duplication, genome fractionation, and genetic drift which cause due to whole genome multiplications 16 . One prediction suggests that the duplicated genes under dosage balance exhibit fewer expressions than other duplicates and RLKs can function not only in defense but also in development and abiotic stress responses 100,101 . Hence, to determine the functional conservation and putative role of CRKs in cotton development and adoption under stresses, publicly available expression data of tissues and stress treatments were used and highlighted several tissue-specific and stress-specific expressions of CRKs in cotton. For instance, a cluster of CRKs; GhCRK053, GhCRK083, GhCRK094, GhCRK038, GhCRK110, GhCRK039, GhCRK041, GhCRK068, GhCRK093, and GhCRK013, induced transcriptionally under salt, drought, heat, and cold stresses in G. hirsutum. Previous studies reported that AtCRKs are transcriptionally induced under abiotic stresses such as salt, drought, UV light, heat, salicylic acid [17][18][19][20][21] . In agreement with such previous studies of CRKs in Arabidopsis, we also found important several stress-responsive cotton CRKs. In addition to abiotic stresses, we also reported the important cotton CRKs response under biotic stresses including cotton leaf curl disease for instance the G. hirsutum CRKs; GhCRK026, GhCRK013, GhCRK007, GhCRK116, GhCRK108, GhCRK099, GhCRK082, and GhCRK072 differentially upregulated in resistant accession while down-regulated in susceptible accession under CLCuD www.nature.com/scientificreports/ disease treatment. The genetic variants analysis also identified several SNPs and InDels in these putative genes. Furthermore, the quantitative real-time expression analysis also validated the RNA-seq analysis of GhCRKs. All selected GhCRKs were transcriptionally upregulated in resistant accession while either down-regulated or did not show any transcript in the susceptible accession. In literature, it is reported that a subset of CRKs is strongly induced in response to pathogens and PAMPs treatments 19,20 and overexpression of AtCRK4, AtCRK5, AtCRK6 , AtCRK13, and AtCRK36 showed enhanced resistance to the bacterial pathogen Pseudomonas syringae as well as also activated the early and late PTI responses 17,21,22 . Henceforth, the comparative expression of CRKs in resistant and highly susceptible cotton provided important CRKs candidates. The coordination of these CRKs during plant immune response suggested that they cooperate in plant defense signaling. Furthermore, the molecular docking of CRKs with CLCuD viral proteins also demonstrated their direct interaction. As CRK proteins possess extracellular domains, which are involved in protein-protein interaction and signal perceptions 12 , the transmembrane domains and intracellular domains transduce and activate MAPK pathways for activation of the plant. The host-pathogen interaction and expression data showed coordination of these putative genes in plant immune signaling. Thus, the GhCRK057, GhCRK059, GhCRK058, GhCRK081, GhCRK008, and GhCKR087 might be a potential marker for CLCuD resistant genotype. The current study provided a deep insight into CRKs in Gossypium sp. The different ploidy level of Gossypium species has different resistance level, for instance, the diploid species like G. arboreum, G. raimondii, and G. herbaceum are naturally resistant to several biotic and abiotic stresses while the tetraploid cotton-like G. hirsutum is susceptible to multiple stresses and the G. barbadense is highly susceptible to environmental stresses. Thus, the comparative study of stress-responsive genes CRKs in cotton is essential for improving cotton growth and development.

Conclusion
The current study identified a total of 437 Cysteine-rich receptor-like kinases (CRKs) encoding genes in five Gossypium sp. The structural and domain-based classification identified several novel domain architectures in Gossypium sp. The genome mapping and genetic diversity (SNPs and InDels) provided important data for cotton breeders and the expression profiling under different environmental stresses and their validation through qPCR under CLCuD demonstrated a putative role in cotton growth and development. The miRNA target site prediction will help to understand the regulation of CRKs in specific tissues. We have provided detailed computational and experimental studies on CRKs in the five species; however, further individual gene functional analysis is required to understand the CRKs mechanism in cotton plants' adaptation.

Data availability
Source data for all the graphs included in this paper are available as Supplementary Data in Excel format. All other data are available from the corresponding author upon reasonable request. It is also stated that there are no ethical issues that required permissions or licenses to complete this work.